Optimal Sparse Segment Identification with Application in Copy Number Variation Analysis.
Authors
Abstract
Motivated by DNA copy number variation (CNV) analysis based on high-density single nucleotide polymorphism (SNP) data, we consider the problem of detecting and identifying sparse short segments in a long one-dimensional sequence of data with additive Gaussian white noise, where the number, length and location of the segments are unknown. We present a statistical characterization of the identifiable region of a segment where it is possible to reliably separate the segment from noise. An efficient likelihood ratio selection (LRS) procedure for identifying the segments is developed, and the asymptotic optimality of this method is presented in the sense that the LRS can separate the signal segments from the noise as long as the signal segments are in the identifiable regions. The proposed method is demonstrated with simulations and analysis of a real data set on identification of copy number variants based on high-density SNP data. The results show that the LRS procedure can yield greater gain in power for detecting the true segments than some standard signal identification methods.
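The abstract does not spell out the LRS algorithm, so the following is only a minimal sketch of one way a likelihood-ratio scan of this kind can be written. It assumes unit-variance Gaussian noise, scores every interval up to a maximum length by its standardized sum, and greedily keeps the highest-scoring non-overlapping intervals above a threshold. The function name `scan_segments`, the parameters `max_len` and `threshold`, and the choice of threshold sqrt(2 log n) in the demo are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def scan_segments(x, max_len, threshold):
    """Greedy interval scan (illustrative sketch, not the paper's exact procedure).

    Scores each interval [start, end) of length <= max_len by
    sum(x[start:end]) / sqrt(length), which is N(0, 1) under pure
    unit-variance noise, then keeps the largest non-overlapping
    intervals whose absolute score exceeds `threshold`.
    """
    n = len(x)
    # Prefix sums give each interval sum in O(1).
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)

    candidates = []
    for start in range(n):
        for length in range(1, min(max_len, n - start) + 1):
            end = start + length  # half-open interval [start, end)
            stat = (prefix[end] - prefix[start]) / math.sqrt(length)
            if abs(stat) > threshold:
                candidates.append((abs(stat), start, end))

    # Strongest candidates first; accept one only if it does not
    # overlap an interval that was already accepted.
    candidates.sort(reverse=True)
    selected = []
    for _, start, end in candidates:
        if all(end <= s or start >= e for s, e in selected):
            selected.append((start, end))
    return sorted(selected)


# Demo on simulated data: one short elevated segment in white noise.
rng = random.Random(0)
n = 2000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
for i in range(500, 510):
    x[i] += 3.0  # hypothetical signal segment at positions 500..509

found = scan_segments(x, max_len=20, threshold=math.sqrt(2 * math.log(n)))
print(found)
```

The scan costs on the order of n times `max_len` interval evaluations, which is why bounding the interval length matters when the segments are short and the sequence is long.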
Similar resources
Robust Detection and Identification of Sparse Segments in Ultra-High Dimensional Data Analysis.
Copy number variants (CNVs) are alterations of the DNA of a genome that result in the cell having fewer or more than two copies of segments of the DNA. CNVs correspond to relatively large regions of the genome, ranging from about one kilobase to several megabases, that are deleted or duplicated. Motivated by CNV analysis based on next generation sequencing data, we consider the problem of detect...
BIRC5 Genomic Copy Number Variation in Early-Onset Breast Cancer
Background: The baculoviral inhibitor of apoptosis repeat-containing 5 (BIRC5) gene is an inhibitor of apoptosis that is expressed in human embryonic tissues but is absent in most healthy adult tissues. The copy number of BIRC5 has been reported to be highly increased in tumor tissues; however, its association with the age of onset in breast cancer is not well understood. Methods: Forty tumor tiss...
On Bottleneck Product Rate Variation Problem with Batching
The product rate variation problem minimizes the variation in the rate at which different models of a common base product are produced on the assembly lines, under the assumption of negligible switch-over cost and unit processing time for each copy of each model. Assuming instead significant setup times and arbitrary processing times turns the problem into a two-phase problem. The first phase determ...
O-27: Genome Instabilities in Preimplantation Development Leading to Genetic Variation between Tissues of Normal Human Fetuses
Background: The origin of midlife copy number variations (CNVs) between tissues in non-genetic diseases is unknown. Such genomic differences are caused by post-zygotic events. They might either arise during life or be due to prevalent mosaicism at the preimplantation stage. We aim to explore fetal mosaicism and its origins. Materials and Methods: Two apparently normal fetuses were achieved following the ...
Identification and Prioritization of Accident-Prone Segments using International Roughness Index
During the last decades, owing to the increase in the number of vehicles, the rate of accident occurrence has grown significantly. Efforts must be made to provide efficient tools to prioritize segments requiring safety improvement and to identify factors that influence accidents. The objective of this research was to determine the safety-oriented threshold of the International Roughness Index (IRI) to recognize...
Journal title:
- Journal of the American Statistical Association
Volume 105, Issue 491
Pages -
Publication date 2010